23 research outputs found

    From cheek swabs to consensus sequences : an A to Z protocol for high-throughput DNA sequencing of complete human mitochondrial genomes

    Background: Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results: Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions: All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources
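
The final bioinformatics steps named in the abstract above (reference-based mapping, coverage output and SNP calling) can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes reads are already mapped without gaps and given as hypothetical `(start, sequence)` pairs, and the function names and the `min_coverage` threshold are illustrative choices.

```python
from collections import Counter


def pileup(reference_length, aligned_reads):
    """Count observed bases at each reference position.

    aligned_reads is a list of (start, sequence) pairs with a 0-based
    start and an ungapped alignment (a simplifying assumption).
    """
    columns = [Counter() for _ in range(reference_length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < reference_length:
                columns[pos][base] += 1
    return columns


def consensus_and_snps(reference, aligned_reads, min_coverage=2):
    """Return (consensus, per-position coverage, SNP list).

    Positions covered by fewer than min_coverage reads are reported as 'N'.
    A SNP is any confidently called base that differs from the reference.
    """
    columns = pileup(len(reference), aligned_reads)
    coverage = [sum(counts.values()) for counts in columns]
    consensus = []
    snps = []
    for pos, counts in enumerate(columns):
        if coverage[pos] < min_coverage:
            consensus.append("N")
            continue
        base, _ = counts.most_common(1)[0]  # majority base at this position
        consensus.append(base)
        if base != reference[pos]:
            snps.append((pos, reference[pos], base))
    return "".join(consensus), coverage, snps


# Toy data, invented for illustration only.
toy_reference = "ACGTACGT"
toy_reads = [(0, "ACGA"), (1, "CGAAC"), (2, "GAACG"), (4, "ACGT")]
toy_consensus, toy_coverage, toy_snps = consensus_and_snps(toy_reference, toy_reads)
print(toy_consensus, toy_coverage, toy_snps)
```

The coverage list is the data behind a coverage plot; a real pipeline would also handle indels, base qualities and primer trimming, which this sketch leaves out.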

    Neolithic Mitochondrial Haplogroup H Genomes and the Genetic Origins of Europeans

    Haplogroup H dominates present-day Western European mitochondrial DNA variability (>40%), yet was less common (~19%) among Early Neolithic farmers (~5450 BC) and virtually absent in Mesolithic hunter-gatherers. Here we investigate this major component of the maternal population history of modern Europeans and sequence 39 complete haplogroup H mitochondrial genomes from ancient human remains. We then compare this ‘real-time’ genetic data with cultural changes taking place between the Early Neolithic (~5450 BC) and Bronze Age (~2200 BC) in Central Europe. Our results reveal that the current diversity and distribution of haplogroup H were largely established by the Mid Neolithic (~4000 BC), but with substantial genetic contributions from subsequent pan-European cultures such as the Bell Beakers expanding out of Iberia in the Late Neolithic (~2800 BC). Dated haplogroup H genomes allow us to reconstruct the recent evolutionary history of haplogroup H and reveal a mutation rate 45% higher than current estimates for human mitochondria.

    Population differentiation of Southern Indian male lineages correlates with agricultural expansions predating the caste system

    Christina J. Adler, Alan Cooper, Clio S.I. Der Sarkissian and Wolfgang Haak are contributors to the Genographic Consortium. Previous studies that pooled Indian populations from a wide variety of geographical locations have obtained contradictory conclusions about the processes of the establishment of the Varna caste system and its genetic impact on the origins and demographic histories of Indian populations. To further investigate these questions, we took advantage of the fact that both Y chromosome and caste designation are paternally inherited, and genotyped 1,680 Y chromosomes representing 12 tribal and 19 non-tribal (caste) endogamous populations from the predominantly Dravidian-speaking Tamil Nadu state in the southernmost part of India. Tribes and castes were both characterized by an overwhelming proportion of putatively Indian autochthonous Y-chromosomal haplogroups (H-M69, F-M89, R1a1-M17, L1-M27, R2-M124, and C5-M356; 81% combined) with a shared genetic heritage dating back to the late Pleistocene (10–30 Kya), suggesting that more recent Holocene migrations from western Eurasia contributed <20% of the male lineages. We found strong evidence for genetic structure, associated primarily with the current mode of subsistence. Coalescence analysis suggested that the social stratification was established 4–6 Kya and there was little admixture during the last 3 Kya, implying a minimal genetic impact of the Varna (caste) system from the historically-documented Brahmin migrations into the area. In contrast, the overall Y-chromosomal patterns, the time depth of population diversifications and the period of differentiation were best explained by the emergence of agricultural technology in South Asia. 
These results highlight the utility of detailed local genetic studies within India, without prior assumptions about the importance of Varna rank status for population grouping, to obtain new insights into the relative influences of past demographic events for the population structure of the whole of modern India. GaneshPrasad ArunKumar, David F. Soria-Hernanz, Valampuri John Kavitha, Varatharajan Santhakumari Arun, Adhikarla Syama, Kumaran Samy Ashokan, Kavandanpatti Thangaraj Gandhirajan, Koothapuli Vijayakumar, Muthuswamy Narayanan, Mariakuttikan Jayalakshmi, Janet S. Ziegle, Ajay K. Royyuru, Laxmi Parida, R. Spencer Wells, Colin Renfrew, Theodore G. Schurr, Chris Tyler Smith, Daniel E. Platt, Ramasamy Pitchappan, The Genographic Consortium

    Systematic and automated discovery of patterns in PROSITE families

    PROSITE is a method for protein classification which relies on a database of biologically significant sites and patterns in protein sequences. Most patterns in PROSITE have been gathered by a labor-intensive combination of experimental characterization of functional residues and sequence alignment. In this paper we present a new and efficient supervised learning procedure, based on the Splash deterministic pattern discovery algorithm and on a framework to assess the statistical significance of patterns. We demonstrate its application to the fully automatic discovery of patterns in 974 PROSITE families. For these families, Splash generates patterns with better specificity and/or sensitivity in 28%, identical statistics in 48%, and worse statistics in 15% of the cases; for the remaining families, patterns exhibited mixed behavior. Second, we have characterized the amount of overlap, on the sequences, between newly discovered patterns and those in PROSITE. In about 75% of the cases, Splash patterns identify sequence sites that overlap more than 50% with those reported in PROSITE. Of the 272 patterns which perform strictly better than the corresponding PROSITE pattern, 178 show more than 70% overlap with the PROSITE pattern. Third, our results suggest that the statistical significance of discovered patterns correlates well with their biological significance. Finally, we use the trypsin subfamily of serine proteases to illustrate the use of this method to exhaustively discover all motifs in a family that are statistically and biologically significant. The complete analysis is sufficiently rapid, taking less than a day for all PROSITE families, to enable the use of this methodology for routine curation of existing motif and profile databases.
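
To make the sensitivity and specificity comparison in the abstract above concrete, the sketch below scores a PROSITE-style pattern against a family and a background set. It is not the Splash algorithm. It assumes a simplified subset of PROSITE pattern syntax, and it assumes the common definitions sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), which the abstract does not spell out. The pattern and sequences are invented.

```python
import re

# One PROSITE element: optional '<' anchor, a residue / x / [allowed] / {forbidden},
# an optional (n) or (n,m) repeat, and an optional '>' anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE pattern into a compiled regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return re.compile("".join(parts))


def evaluate(pattern, family, background):
    """Return (sensitivity, specificity) of a pattern on labeled sequences."""
    regex = prosite_to_regex(pattern)
    true_pos = sum(1 for seq in family if regex.search(seq))
    false_pos = sum(1 for seq in background if regex.search(seq))
    sensitivity = true_pos / len(family)
    specificity = (len(background) - false_pos) / len(background)
    return sensitivity, specificity


# Toy pattern and sequences, invented for illustration only.
toy_pattern = "C-x(2)-[ST]-{P}-H"
toy_family = ["AACAATGHK", "CQWSAHLL", "MMCKKTPH"]
toy_background = ["CGGSLHAA", "AAAAAA", "CCCC", "HTSC"]
print(evaluate(toy_pattern, toy_family, toy_background))
```

Comparing two candidate patterns for the same family then amounts to comparing the two returned pairs, which is how "better", "identical", "worse" and "mixed" outcomes can be distinguished.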

    Inferring Common Origins From Mtdna

    The history of human migratory events can be inferred from observed variations in DNA sequences. Such studies on non-recombinant mtDNA and Y-chromosome show that present day humans outside Africa originated from one or more migrations of small groups of individuals between 30K-70K YBP. Coalescence theory reveals that any collection of non-recombinant DNA sequences can be traced back to a common ancestor. Mutations fixed by genetic drift act as markers on the timeline from the common ancestor to the present and can be used to infer migration and founder events that occurred in ancestral populations. However, most mutations seen in the data today are relatively recent and do not carry useful information about deep ancestry. The only ones that can be used reliably are those that can be shown to robustly distinguish large clusters of individuals and thus qualify as true representatives of population events in the past. In this talk, we present results from the analysis of 1737 complete mtDNA sequences from public databases to infer such a robust set of mutations that reveal the haplogroup phylogeny. Using principal component analysis we identify the samples in L, M and N clades and with unsupervised consensus ensemble clustering we infer the substructure in these clades. Traditional methods are inadequate to handle data of this size and complexity. The substructure is inferred using a new algorithm that mitigates the usual problems of sample size bias within haplogroups as well as the sampling bias across haplogroups. First, we cluster the data in each of the M, N, L clades separately into k = 2, 3, 4, ..., k_max groups using an agreement matrix derived from multiple clustering techniques and bootstrap sampling. Repeated training/test splits of the samples identify robust clusters and patterns of SNPs which can assign haplogroup labels with a reliability greater than 90%. 
Even though the clustering at each k is done independently, the clusters split in a way that suggests that the data is revealing population events; the clustering at level k has k - 2 clusters which are identical with those at level k - 1, plus two more that arise from a split of one of the clusters at level k - 1. The clustering is repeated with an equal number of samples from the first level clusters. The sequence in which the clusters now split defines a binary network which reveals population events unbiased by sample size. We root the network using an out-group and, assuming a molecular clock, identify an internal node in the bifurcation process which is equidistant from the leaves. This rooting removes the bias across haplogroups which would otherwise influence the order in which the clusters emerge. Our analysis shows that the African clades L0/L1, L2 and L3 have the greatest heterogeneity of SNPs, in agreement with their ancient ancestry. It also suggests that the M, N clades originated from a common ancestor of L3 in two separate migrations. The first migration gave rise to the M haplogroup, whose descendants currently populate South-East Asia and Australia. The second migration resulted in the N haplogroup, accounting for the current populations in China, Japan, Europe, Central Asia and North and South America. We reveal and robustly label many branches of the mtDNA tree, improving current results significantly. We find that for our choice of robust SNPs, the genetic distance between the NA and NRB haplogroups is smaller than that between B and J/T/H/V/U. The detailed N migratory sub-tree is rooted so that the T, J and U haplogroups are on one side of the root and the F, V/H, I, X, R5, B, N9, A and W are on the other. We also find a detailed structure for the M tree consistent with prior literature and we infer additional branches for the MD haplogroup. Finally we provide detailed SNP patterns for each haplogroup identified by our clustering. 
Our patterns can be used to infer a haplogroup assignment with reliability greater than 90%. © Springer-Verlag Berlin Heidelberg 2006
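
The agreement matrix mentioned in the abstract above can be sketched as follows. This is a generic consensus-clustering illustration, not the authors' algorithm: it assumes each clustering run is supplied as a hypothetical dict mapping sample index to cluster label (a bootstrap run may cover only a subset of samples), and it extracts consensus clusters as connected components above an arbitrary agreement threshold.

```python
from itertools import combinations


def agreement_matrix(labelings, n_samples):
    """Fraction of runs in which each pair of samples shares a cluster.

    Only runs that include both samples count toward a pair, so bootstrap
    subsamples are handled without penalizing unsampled items.
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    co_sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in labelings:
        for i, j in combinations(sorted(labels), 2):
            co_sampled[i][j] += 1
            co_sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    agreement = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        agreement[i][i] = 1.0
        for j in range(n_samples):
            if i != j and co_sampled[i][j]:
                agreement[i][j] = together[i][j] / co_sampled[i][j]
    return agreement


def consensus_clusters(agreement, threshold=0.5):
    """Group samples linked by agreement >= threshold (connected components)."""
    n = len(agreement)
    assigned = [False] * n
    clusters = []
    for seed in range(n):
        if assigned[seed]:
            continue
        assigned[seed] = True
        stack, members = [seed], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if not assigned[j] and agreement[i][j] >= threshold:
                    assigned[j] = True
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters


# Toy runs, invented for illustration; label names differ between runs on purpose.
toy_runs = [
    {0: "a", 1: "a", 2: "b", 3: "b"},
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: "x", 1: "y", 2: "z", 3: "z"},
    {0: 1, 1: 1, 3: 2},  # bootstrap run that did not sample item 2
]
toy_agreement = agreement_matrix(toy_runs, 4)
print(consensus_clusters(toy_agreement))
```

Because the matrix depends only on whether two samples share a label within a run, runs from different clustering techniques can be pooled even though their label names are unrelated.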

    Maximum-Likelihood Estimation of Site-Specific Mutation Rates in Human Mitochondrial DNA From Partial Phylogenetic Classification

    The mitochondrial DNA hypervariable segment I (HVS-I) is widely used in studies of human evolutionary genetics, and therefore accurate estimates of mutation rates among nucleotide sites in this region are essential. We have developed a novel maximum-likelihood methodology for estimating site-specific mutation rates from partial phylogenetic information, such as haplogroup association. The resulting estimation problem is a generalized linear model, with a nonstandard link function. We develop inference and bias correction tools for our estimates and a hypothesis-testing approach for site independence. We demonstrate our methodology using 16,609 HVS-I samples from the Genographic Project. Our results suggest that mutation rates among nucleotide sites in HVS-I are highly variable. The 16,400–16,500 region exhibits significantly lower rates compared to other regions, suggesting potential functional constraints. Several loci identified in the literature as possible termination-associated sequences (TAS) do not yield statistically slower rates than the rest of HVS-I, casting doubt on their functional importance. Our tests do not reject the null hypothesis of independent mutation rates among nucleotide sites, supporting the use of site-independence assumption for analyzing HVS-I. Potential extensions of our methodology include its application to estimation of mutation rates in other genetic regions, like Y chromosome short tandem repeats